1.7 Confidence Intervals via Normal Approximation

confidence interval (信頼区間 in Japanese)

https://ja.wikipedia.org/wiki/信頼区間

a confidence interval around this estimate would not only be more informative and desirable in certain applications, but our point estimate could be quite sensitive to the particular training/test split

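In other words: a confidence interval around the estimate of the model's generalization performance would be more informative, and desirable in certain applications. Moreover, the point estimate alone could be quite sensitive to the particular training/test split.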

normal approximation

A simple approach for computing a confidence interval on a model's predictive accuracy or error

compute the confidence interval on the mean on a single training-test split under the central limit theorem

i.e., compute the confidence interval on the mean from a single training/test split, relying on the central limit theorem

Eq. (17): compute the confidence interval of the accuracy from the observed accuracy and the number of samples in the test set
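A minimal Python sketch of that calculation, assuming Eq. (17) is the usual normal-approximation interval ACC_S ± z·sqrt(ACC_S(1 − ACC_S)/n), with z ≈ 1.96 for a 95% interval. The function name `normal_approx_ci` and its parameters are hypothetical, chosen only for illustration.

```python
import math


def normal_approx_ci(acc: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a test-set accuracy.

    Assumption: the interval has the standard form ACC +/- z * sqrt(ACC * (1 - ACC) / n).
    acc: observed accuracy ACC_S on a test set of n samples
    z:   standard-normal quantile (1.96 for roughly 95% confidence)
    """
    if not 0.0 <= acc <= 1.0:
        raise ValueError("acc must be in [0, 1]")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard error of the accuracy estimate
    se = math.sqrt(acc * (1.0 - acc) / n)
    return acc - z * se, acc + z * se


lower, upper = normal_approx_ci(0.8, 100)
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```

With an accuracy of 0.8 on 100 test samples this prints `95% CI: [0.7216, 0.8784]`. For accuracies close to 0 or 1, or for small n, this interval can extend beyond [0, 1], a known weakness of the plain normal approximation.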

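Insight: so computing a confidence interval for the accuracy is an approach to the problem pointed out in 1.6 Pessimistic Bias, namely that we cannot know the true generalization performance!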

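Impression: "normal approximation" seems to mean approximating the binomial distribution with a normal distribution (the Japanese term is probably 正規近似)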

Dataset S (here, the test set) of size n

Compute the ACC on S (Eq. (10))

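ACC is the sum over all samples of the Kronecker delta (1 for a sample whose prediction matches the true label, 0 otherwise), divided by n, i.e., its mean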
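As a small illustration, the accuracy can be written literally as the mean of a Kronecker delta over the samples. This is a simplified sketch: it compares prediction and label directly instead of going through a 0-1 loss, and the helper names `kronecker_delta` and `accuracy` are hypothetical.

```python
def kronecker_delta(a, b) -> int:
    """1 if a == b, else 0."""
    return 1 if a == b else 0


def accuracy(y_true, y_pred) -> float:
    """ACC_S: mean over the n samples of delta(true label, prediction)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("empty input")
    n = len(y_true)
    return sum(kronecker_delta(t, p) for t, p in zip(y_true, y_pred)) / n


y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]
print(accuracy(y_true, y_pred))  # 4 of 5 correct → 0.8
```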

Treat the prediction on each sample as a Bernoulli trial

The number of correct predictions X then follows a binomial distribution with parameters (n, p)

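n: number of test samples (the number of trials)

k: number of successes (a value that X can take)

p: probability of success (here, success means the prediction is correct)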

https://ja.wikipedia.org/wiki/二項分布

The expected number of successes is np (= μ)

Example: with a 50% probability of success and 40 samples, the expected number of successes is 20

The variance of X around this expected value np is np(1 − p)
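A quick numerical check of np and np(1 − p) for the 40-sample, 50% example, computing the mean and variance directly from the binomial probability mass function P(X = k) = C(n, k) p^k (1 − p)^(n − k). The helper names are hypothetical; only `math.comb` (Python 3.8+) from the standard library is used.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of X, computed by summing over the whole pmf."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    return mean, var


# Example from the notes: 40 test samples, 50% chance of a correct prediction
n, p = 40, 0.5
mean, var = binomial_moments(n, p)
print(round(mean, 6), n * p)            # 20.0 20.0
print(round(var, 6), n * p * (1 - p))   # 10.0 10.0
```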

The variance of the accuracy estimate (Eq. (15))

We are interested in the mean number of successes, i.e., the proportion X/n, rather than the absolute count X

The variance of this mean X/n is p(1 − p)/n

ref: Japanese Wikipedia, 二項分布 (Binomial distribution), section 正規分布への近似 (Normal approximation)
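To see the p(1 − p)/n result numerically, here is a small Monte Carlo sketch. It assumes an idealized classifier whose predictions are independently correct with a fixed probability p (so X ~ B(n, p)) and only resamples the test set, ignoring any variation from retraining the model; `simulate_accuracies` is a hypothetical helper.

```python
import random
import statistics


def simulate_accuracies(n: int, p: float, reps: int, seed: int = 0) -> list[float]:
    """Simulate `reps` test sets of size n for an idealized classifier.

    Assumption: each prediction is independently correct with probability p,
    so the number of correct predictions X follows B(n, p).
    Returns the observed accuracy X / n for each simulated test set.
    """
    rng = random.Random(seed)
    accs = []
    for _ in range(reps):
        x = sum(1 for _ in range(n) if rng.random() < p)
        accs.append(x / n)
    return accs


n, p = 50, 0.8
accs = simulate_accuracies(n, p, reps=10_000)
empirical = statistics.pvariance(accs)
theoretical = p * (1 - p) / n
print(f"empirical Var(X/n): {empirical:.5f}")
print(f"p(1 - p)/n:         {theoretical:.5f}")
```

The two numbers should agree closely, both near 0.0032 for n = 50 and p = 0.8.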

In the formula, p is replaced by ACC (the observed accuracy on S)
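Written out as a short derivation (my reconstruction of the steps, with z the standard-normal quantile, e.g. 1.96 for 95%): dividing the count X by n scales its variance by 1/n², and replacing the unknown p with the observed ACC_S gives the plug-in variance and the resulting interval.

```latex
\operatorname{Var}\!\left(\frac{X}{n}\right)
  = \frac{\operatorname{Var}(X)}{n^{2}}
  = \frac{n p (1 - p)}{n^{2}}
  = \frac{p (1 - p)}{n},
\qquad
\sigma^{2} \approx \frac{\mathrm{ACC}_S\,(1 - \mathrm{ACC}_S)}{n},
\qquad
\mathrm{ACC}_S \pm z \sqrt{\frac{\mathrm{ACC}_S\,(1 - \mathrm{ACC}_S)}{n}}
```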

(TODO: this may be something I learned from a university textbook)

I would rather recommend repeating the training-test split multiple times to compute the confidence interval on the mean estimate

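In other words, the author's own preference is to repeat the training/test split several times and compute the confidence interval on the mean of those estimates.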
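A minimal sketch of the repeated-split idea in plain Python. Everything here is a hypothetical stand-in: toy one-dimensional data with two Gaussian classes, and a threshold classifier placed halfway between the class means in place of a real model. The interval at the end simply applies the normal approximation to the mean of the per-split accuracies; because the random splits overlap, the runs are not independent, so treat that interval as a rough (likely too narrow) indication rather than an exact 95% interval.

```python
import math
import random
import statistics


# Hypothetical toy data: class 0 ~ N(0, 1), class 1 ~ N(2, 1)
def make_data(n_per_class: int, seed: int = 0) -> list[tuple[float, int]]:
    rng = random.Random(seed)
    data = [(rng.gauss(0.0, 1.0), 0) for _ in range(n_per_class)]
    data += [(rng.gauss(2.0, 1.0), 1) for _ in range(n_per_class)]
    return data


def fit_threshold(train: list[tuple[float, int]]) -> float:
    """Toy 'model': a threshold halfway between the two class means."""
    m0 = statistics.fmean(x for x, y in train if y == 0)
    m1 = statistics.fmean(x for x, y in train if y == 1)
    return (m0 + m1) / 2


def split_accuracy(data, test_frac: float, rng: random.Random) -> float:
    """One random training/test split: fit on the training part, score the test part."""
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    test, train = shuffled[:n_test], shuffled[n_test:]
    threshold = fit_threshold(train)
    # Predict class 1 when x > threshold; a hit is when that matches the label
    return statistics.fmean(1.0 if (x > threshold) == (y == 1) else 0.0 for x, y in test)


def repeated_holdout(data, reps: int = 50, test_frac: float = 0.3, z: float = 1.96, seed: int = 1):
    """Average the test accuracy over `reps` random splits.

    The interval is a naive normal approximation on the mean that treats the
    splits as independent, which they are not (the splits overlap).
    """
    rng = random.Random(seed)
    accs = [split_accuracy(data, test_frac, rng) for _ in range(reps)]
    mean = statistics.fmean(accs)
    sd = statistics.stdev(accs)
    half_width = z * sd / math.sqrt(reps)
    return mean, sd, (mean - half_width, mean + half_width)


data = make_data(100)
mean, sd, (lo, hi) = repeated_holdout(data)
print(f"mean ACC {mean:.3f}, sd {sd:.3f}, naive 95% CI [{lo:.3f}, {hi:.3f}]")
```

The standard deviation `sd` shows how much a single-split estimate would have moved around, which is exactly the sensitivity to the particular split that motivates the repetition.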

having fewer samples in the test set increases the variance (see n in the denominator above) and thus widens the confidence interval.

i.e., the fewer samples in the test set, the larger the variance, and hence the wider the confidence interval

Increasing the number of test samples n decreases the variance given by Eq. (15), because n is in the denominator

A smaller variance means a narrower confidence interval
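The effect of n on the width can be tabulated with a few lines of Python, assuming the same normal-approximation half-width z·sqrt(ACC(1 − ACC)/n) and a hypothetical observed accuracy of 0.8; `ci_half_width` is an illustrative helper name.

```python
import math


def ci_half_width(acc: float, n: int, z: float = 1.96) -> float:
    """Half-width z * sqrt(acc * (1 - acc) / n) of the normal-approximation CI."""
    return z * math.sqrt(acc * (1.0 - acc) / n)


acc = 0.8  # hypothetical observed test accuracy
for n in (25, 100, 400, 1600):
    print(f"n = {n:5d}: ACC = {acc} ± {ci_half_width(acc, n):.4f}")
```

Each 4× increase in n halves the half-width (about 0.157, 0.078, 0.039 and 0.020 here), since the half-width shrinks as 1/sqrt(n).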